Inverse probability weighting to handle attrition in cohort studies: some guidance and a call for caution

Background Attrition in cohort studies challenges causal inference. Although inverse probability weighting (IPW) has been proposed to handle attrition in association analyses, its relevance has been little studied in this context. We aimed to investigate its ability to correct for selection bias in exposure-outcome estimation by addressing an important methodological issue: the specification of the response model. Methods A simulation study compared the IPW method with complete-case analysis (CCA) for nine response-mechanism scenarios (3 missing at random – MAR and 6 missing not at random - MNAR). Eighteen response models differing by the type of variables included were assessed. Results The IPW method was equivalent to CCA in terms of bias and consistently less efficient in all scenarios, regardless of the response model tested. The most effective response model included only the confounding factors of the association model. Conclusion Our study questions the ability of the IPW method to correct for selection bias in situations of attrition leading to missing outcomes. If the method is to be used, we encourage including only the confounding variables of the association of interest in the response model. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01533-9.

Page 2 of 15 Metten et al. BMC Medical Research Methodology (2022) 22:45 missing not at random mechanism (MNAR), it depends on the unobserved data. The simplest and most widely used approach to handle total non-response in cohort studies is complete-case analysis (CCA). This method assumes a MCAR mechanism and consists of studying the exposure-outcome association in the subset of respondents only. However, total non-response is generally considered to result from MAR or MNAR mechanisms. Several methodological publications have suggested the use of the inverse probability weighting (IPW) method in situations of the MAR mechanism of attrition [3,4]. It aims to recreate a representative sample of the initial cohort by differentially weighting the so-called "complete individuals" (i.e. those who participate in the survey wave under consideration). More precisely, when modeling the association between exposure and outcome, respondents are weighted by the inverse of their probability to participate (hereinafter referred to as the "response probability" or "probability of response"). This response probability depends on some of the respondent's characteristics. The use of the inverse of this probability implies that a respondent with a high probability of response (e.g. an individual with a high socio-economic level [5]) is given a comparatively lower weight in the analysis. The approach can be summarized as: "the respondents carry the weight of the non-respondents".
The probability of response is unknown and needs to be estimated from the data. The first step is therefore to build a response model (logistic regression model) to obtain weights that will be used in a second step in the association model. Because the association model is only fitted among respondents or complete individuals, the method is also called "weighted complete-case analysis" [2]. It is also referred to as "inverse probability of participation/attrition weighting" (IPPW/IPAW) in the literature [6][7][8].
Originally developed for reducing the effects of confounding in observational studies (propensity score method) [9], the IPW method was extended to correct for selection biases in situations of attrition. Although researchers have already adopted the method in association studies [10,11], guidance on its correct use is still needed, in particular regarding the specification of the response model (i.e. variables to be introduced into the response model).
In this article, we will focus on attrition resulting in a missing outcome of interest. This situation is commonly encountered in mother-child cohorts for example, where the effects of prenatal medical conditions or exposures on the future health of the children are studied. At the time point of interest (6 year old, for example), some children do not participate in the follow-up. Depending on the attrition mechanism (MAR, MNAR), restricting the analysis to the participating children may result in a biased estimation of the association between the exposure and the outcome.
Our work aimed at evaluating through simulations the ability of the IPW method to correct for a selection bias under various missingness mechanisms and specifications of the response model. Response model specifications were compared in terms of bias, variance and mean square error of the association estimates between the exposure and the outcome.
In all scenarios tested, we assumed that the exposure variable and the other covariates were fully observed at preceding waves or at baseline.

Which variables should be introduced into the response model?
Relatively few authors have addressed the question of which variables should be introduced in the response model from which the weighting is derived. In 2004, Hernan et al. recommended including the exposure variable and all variables that independently predict both response and outcome [3]. In 2013, Seaman and White advised not including variables that are exclusively related to the response without being related to the outcome and exposure variables. They suggested adding confounding variables (i.e. associated with both exposure and outcome) and prognostic variables (i.e. exclusively associated with the outcome) of the association studied [4]. Seaman and White's recommendations are consistent with simulation studies performed in the propensity score literature [12,13]. In this context, including variables that are related to the exposure but not the outcome is discouraged. Variables unrelated to the exposure but related to the outcome should instead be included. The literature on the IPEW method considers only two variables: the exposure and the outcome, whereas the IPPW method involves the response as a third variable. Thus, there is still uncertainty as to whether the inclusion of confounding and prognostic variables in the response model depends on whether or not they are associated with the response. Furthermore, if the exposure variable X is itself associated with the response, should it be included in the response model?
A quick glance at the applied literature shows that researchers usually build the response and association models independently. They often fit a logistic regression model by including the presumed predictors of response, whether or not they are related to exposure or outcome [14,15]. Their approach is thus primarily to fit a model that perfectly predicts the response and not to optimize the response model in relation to the association model.
None of the proposed strategies in the literature has been tested through simulations and they do not appear to be applied by researchers. Therefore, we propose here to provide insight on this issue by studying the impact of the type of variables included in the response model on the bias and variance of the exposure regression coefficient in the association model.

Simulation study
We conducted a Monte-Carlo simulation study under several MAR and MNAR scenarios. We aimed to evaluate i) the relative performance of the IPPW method relative to CCA and ii) how the specification of the response model in the IPPW method affects the bias of the exposure regression coefficient β , its variance and mean square error, and the coverage rate of confidence intervals.
The SAS code to implement the simulation study is available in Additional file 1: e-Appendix 1.
We focused on the case of a linear regression model in which a continuous outcome is explained by continuous exposure and covariates.

Data-generating process
We first created a sample of size n = 1,000, containing seven covariates z 1 , …, z 7 generated independently according to standard normal distributions. We then generated an exposure variable according to the following model: where ϵ i is generated according to a standard normal distribution. In the exposure model (1), the coefficients were chosen as α 1 = α 2 = α 5 = α 6 = 0.218, so that the correlation between x i and each of the covariates z i was approximately 0.2. We generated an outcome variable according to the following model: where ϵ′ i is generated according to a standard normal distribution. In the outcome model (2), the coefficients were chosen as β =0.5, β 1 = β 5 = 0.170, and β 3 = β 7 = 0.230, such that the correlation between y i and x i was approximately 0.3,and the correlation between y i and any of the covariates z i was approximately 0.2. Finally, we generated response probabilities according to the following logistic model: We used the values γ y = 0.0, 0.2 or 0.5 and γ x = 0.0, 0.2 or 0.5. The case in which γ y =0.0 corresponds to a MAR situation (i.e. the response probability does not depend (1) on y i ). The cases in which γ y =0.2 and 0.5 correspond to MNAR situations (i.e. the response probability depends on y i ). In the response model (3), the coefficients γ 1 , γ 2 , γ 3 , and γ 4 were chosen to be equal to 0.1. The coefficient γ 0 was chosen such that the average response rate was approximately 60% for all cases. In the sample, the individuals responded independently with the probabilities p i . The data-generation model is presented in Fig. 1 and the nine response mechanism scenarios are summarized in Table 1.

Simulation parameters and performance criteria
We compared the IPPW method to CCA for a parsimonious association model, i.e. including only the confounding variables Z 1 and Z 5 , which corresponds to standard epidemiological practice: . Several response models were tested (see Table 2) to determine the impact of the type of variables included on the β regression coefficient of the exposure variable and its variance in the association model. Briefly, we first evaluated the "well-specified" response model, i.e. the one that included all the variables really related to the response (X, Z 1 , Z 2 , Z 3 , Z 4 ), as simulated (Eq. 1). We also initially included the exposure variable X, although this variable was not associated with the response in certain tested scenarios (MAR 1, MNAR 1, MNAR 4). We then tested a model including all available variables. Then, we assessed the proposals by Hernan et al. (2004) and by Seaman and White (2013), described above [3,4]. Finally, we evaluated parsimonious strategies: including only the confounding variable associated with the response, including only the confounding variable not associated with the response, including both, including both with the addition of a prognostic variable not associated with the response, and finally, including both confounding and prognostic variables not associated with the response. All these response models were then evaluated without the exposure variable X.
The generation of the sample and variables was repeated B = 10,000 times. For each sample, we computed the β regression coefficient and its variance according to the 18 possible response models. The simulations were conducted using SAS version 9.4.
The results were assessed according to the following criteria: • The Monte Carlo bias: • The Monte Carlo variance:  Table 1   Table 1 Response mechanism scenarios (data generation) γ k :regression coefficients of the generated response models (logit(p i ) = γ 0 + γ y y i + γ x x i + γ 1 z 1i + γ 2 z 2i + γ 3 z 3i + γ 4 z 4i ) Scenario • The mean square error: • The relative root mean square error: The Monte Carlo variance is the variance of the estimates over 10,000 replications. Therefore, it accounts for the entire variability of the estimators, including the fact that the weights are estimated. We have also computed the coverage rates for the normality-based confidence intervals for β x , with nominal rates of 2.5% in each tail.

Results of the simulation study
The simulation results are reported in Tables 3, 4 and 5 for the Monte Carlo bias, variance, mean square error and relative root mean square error, and in Table 6 for the coverage rates.

Bias in the β regression coefficient
We observed no bias with either CCA or the IPPW method for the three MAR scenarios and MNAR scenario 1 (γ x = 0.0, γ y = 0.2). A bias occurred with both methods for the five other MNAR scenarios, with a greater amplitude for MNAR scenarios 5 (γ x = 0.2, γ y = 0.5) and 6 (γ x = 0.5, γ y = 0.5). The bias was globally equivalent between CCA and the IPPW method for these five scenarios. Within the IPPW method, all response models tested showed the same bias pattern across all MAR and MNAR scenarios. For MNAR scenarios 1 to 3, the absolute bias increases as γ x increases. Similarly, the absolute bias increases as γ x increases for MNAR scenarios 4 to 6.

Variance of the β regression coefficient
The IPPW method was less efficient than CCA for all scenarios. We observed an increase in variance with increasing correlation between the exposure variable X and the response (illustrated by Figs. 2 and 3). The loss of efficiency of the IPPW method was thus particularly pronounced in MAR scenario 3 and MNAR scenarios 3 and 6 (all three characterized by γ x = 0.5).      The response models that enabled a reduction in the variance were those in which the exposure variable X was removed (see Figs. 2 and 3). This was particularly observed in scenarios in which the exposure variable X was associated with the response, but it was also observed in MAR scenario 1 and MNAR scenarios 1 and 4 (all three characterized by γ x = 0.0). Among the response models without the exposure variable X, the response model that further reduced the variance was that which included only the variable Z 5 (confounding variable not associated with the response). Nevertheless, response models including the variables Z 5 , Z 7 (a confounding variable and a prognostic variable, neither associated with the response), and only Z 1 (confounding variable associated with the response) also showed good performance in terms of precision. Overall, the gain in precision obtained with all these response models did not enable us to reach the level of precision obtained using CCA.

Mean square error of the β regression coefficient
For the MAR scenarios, all the tested estimators are unbiased and there is therefore no difference between the variance and the mean square error (see Table 3). For MNAR scenarios 1 to 3, the mean square error increases with γ x , i.e. when the correlation between the exposure variable and the response increases. This also holds true for MNAR scenarios 4 to 6.

Coverage rates
The coverage rates are well respected for all the MAR scenarios and for MNAR scenario 1. For MNAR scenarios 1 to 3, the coverage decreases as the correlation between the exposure variable and the response increases. This also holds true for MNAR scenarios 4 to 6. The coverage rates are poorly respected for MNAR scenarios 5 and 6.

Illustrative example
As an example, we analyzed the association between prepregnancy maternal BMI with the child's BMI at age 7 in TIMOUN, a prospective mother-child cohort study conducted in the Guadeloupe archipelago (French West Indies) [16].

Study population and data collection
Between November 2004 and December 2007, 1068 pregnant women were enrolled in TIMOUN by Table 6 Simulation study results: coverage rate of the normality-based confidence interval for the β regression coefficient for CCA and the IPPW method (18 response models), for nine response mechanism scenarios   Table 2) for the nine response mechanism scenarios. The variance increased as the correlation between the exposure variable and the response variable increased for both methods. The variance was consistently higher with the IPPW method than with CCA in all scenarios. With the IPPW method, variance inflation was particularly observed when the exposure variable X was put into the response model Fig. 3 Monte-Carlo variance obtained with CCA and the IPPW method (response models 6 and 15, see Table 2) for the nine response mechanism scenarios. The variance increased as the correlation between the exposure variable and the response variable increased for both methods. The variance was consistently higher with the IPPW method than with CCA in all scenarios. With the IPPW method, variance inflation was particularly observed when the exposure variable X was put in the response model. On the other hand, removal of variable X (covariate Z 5 only) resulted in the variance obtained with the IPPW method being very close to that obtained by CCA obstetricians during their second-or third-trimester prenatal visit at public hospitals or at a local dispensary. At inclusion, women were interviewed by trained midwives to assess their medical history, socioeconomic conditions, and lifestyle. At birth, information concerning maternal diseases during pregnancy, health status of the newborn, and details of the delivery was also collected [17]. In total, 1033 single live births were registered. Several followups were organized within a selected subsample of the children at 3, 7, and 18 months of age [18,19]. When the children were 7 years of age, all the mothers initially included were invited to participate in a new follow-up which consisted of an interview of the mothers and a medical examination of the children. Among the 1033 mother-child couples initially included, 592 participated in this second wave, representing 57% of the initial sample. Weight was not measured for two children examined at age 7, resulting in a final population of 590 for the association studied (see detailed flow-chart in Additional file 1: e-Appendix 2).

Outcome and exposure
The exposure of interest was the pre-pregnancy maternal BMI (kg/m 2 ). It was calculated from the mothers' selfreported weight and height before pregnancy at inclusion in the cohort. The outcome of interest was the child's BMI at 7 years. It was calculated from measurements performed during a medical examination at 7 years.

Covariates
The covariates considered in the analysis were maternal age at birth (continuous), maternal educational level (< 5 years, 5-12 years, > 12 years), maternal place of birth (French West Indies, other Caribbean island, Europe), non-gestational maternal diabetes (yes, no), enrollment site (university hospital, local hospital, antenatal care dispensary), maternal alcohol consumption during pregnancy (yes, no), maternal smoking during pregnancy (yes, no), sex of the child (boy, girl). The proportion of missing data within these covariates did not exceed 3%, except for maternal alcohol consumption during pregnancy (5.6%). For the variables with missing values, a single imputation by the modal value was previously performed.
The directed acyclic graph (DAG) on which we based our analyses is presented in Fig. 4. All arrows were placed according to a priori knowledge. In our study, the DAG approach did not identify all the types of covariates Z presented in the simulation study: no variables of type Z 3 , Z 5 , or Z 6 were present in our example. In this didactic example, we assume a situation equivalent to the MAR 1 scenario in the simulation study (i.e. the response at 7 years depends neither on the exposure nor the outcome, but only on the covariates).
A linear regression model was fitted with an a priori adjustment for maternal education and maternal place of birth (confounding variables). Both CCA and the IPPW method were applied, the latter using several response models. Fig. 4 Directed acyclic graph (DAG) of the known or assumed associations between variables of the illustrative example. For the sake of simplicity and clarity, the arrows representing the associations between the covariates are not drawn. Not all types of variables considered in the simulation study were suitable for this illustrated example. The covariates 'maternal educational level' and 'maternal place of birth' were considered to be confounding factors in the relationship between pre-pregnancy maternal BMI and child BMI at age 7. The association models were therefore adjusted for these covariates The analyses were performed using R 3.3.2 (R Foundation for Statistical Computing, Vienna, Austria). The standard errors were computed taking into account the weight estimation phase, according to the method described by Metten et al. [20].
The R code to implement the CCA and IPPW analyses and a training dataset are available in Additional file 1: e-Appendix 3.

Results
The β coefficients related to the exposure of interest were very similar between CCA and the IPPW method (Table 7). Within the IPPW results, the most effective response model strategy was the one including only Z 1 variables (maternal educational level and maternal place of birth).

Discussion
Attrition is a major methodological issue in cohort studies. It challenges the validity of association analyses because its occurrence is generally not completely at random. Several authors have proposed the IPPW method to correct for potential selection biases [3,4]. However, little evaluation of the method has been performed and there is little guidance for researchers who wish to apply it, in particular for the specification of the response model.
Our simulation study showed no superiority of the IPPW method over CCA in terms of bias, and it even led to a loss of efficiency. Both were similarly unbiased in the MAR scenarios and similarly biased in most MNAR scenarios. For the MNAR scenarios, the absolute bias increased as the correlation between the exposure and the response increased. As a result, the mean square error is high for these scenarios when γ x = 0.5. In addition, because the bias is negative, the confidence intervals are shifted to the left and the nominal error rates are poorly respected.
These results are consistent with those observed in a study comparing several methods of handling attrition in a simulated cohort of 300 subjects [21]. In this study, the authors concluded that CCA produces results as valid as those obtained with the other compared methods, which included the IPPW method. It is worth noting that the IPPW method consists in reweighting the study population with complete data, meaning that both CCA and IPPW methods are based on the same sub-population. Therefore, a difference in efficiency cannot be attributed to a varying sample size. One explanation for the loss of efficiency observed with the IPPW method lies in the fact that adding covariates in the response model tends to increase the variability of estimated weights.
We chose to solely compare the IPPW method to CCA. However, there are also other approaches, including imputation methods, which consist of replacing missing values with plausible ones. Multiple imputation (MI) is an advanced imputation method that has steadily improved and gained popularity in recent years [22,23]. It consists of imputing the dataset several times by using adapted models that include the collected variables. However, imputation methods are mainly used for missing covariates in situations of partial non-response. Seaman and White emphasized that it may be potentially dangerous to use MI in situations of total non-response [4]. The risk of mis-specifying the imputation model would be high because it requires the imputation of all missing variables of a given survey wave, without auxiliary information at the time of the survey wave. The results of Lewin et al. also showed that MI was no better than CCA in situations of attrition that lead to a missing outcome [24]. This is also consistent with the findings of Kristman et al. [21].
Our study aimed also to assess the impact of the choice of the variables included in the response model on the bias of the exposure regression coefficient and its variance. The various response models tested did not change the bias patterns, which is consistent with what has been observed in the literature on the propensity score method. Indeed, Brookhart et al. showed that the issue of the choice of variables resided essentially in the variance, not in the bias [12]. The strategy of not including variables associated with the exposure in the propensity score, but rather confounding and prognostic variables, improved the precision of the estimates without increasing the bias.
In our study, we show that it is preferable not to include the exposure variable in the response model. Otherwise variance inflation would be observed, which is not in line with the proposal of Hernan et al. [3]. Paradoxically, this phenomenon was particularly pronounced in scenarios in which the exposure variable was associated with the response. This can possibly be explained by over-fitting because the exposure variable is present in the association model. Within response models without the exposure variable, the minimalist strategy, consisting of including only the confounding variable unrelated to the response, resulted in the lowest estimated variance. Close response models (in order of best precision: inclusion of both confounding and prognostic variables unrelated to the response; inclusion of the confounding variable related to the response) also performed well in terms of precision. Thus, parsimonious strategies using the same variables as the association model (except the exposure variable) were the most effective. This was also observed in our illustrated example. The strategy to optimize the response model when using the IPW method to limit a selection bias (IPPW) is thus the same as that recommended in situations in which the IPW method is used to limit a confounding bias (propensity scores).
The strategy for constructing the response model requires clear identification of the role played by the variables. This can be based on a structural approach using DAGs, as we did in our example in Section 5 [3]. DAGs are causal analysis tools originally designed to assist in the selection of variables in an association model [25]. They make it possible to control for a confusion bias and avoid over-adjustment in the association model. Within the framework of the propensity score method, Austin and Stuart recommended using them to identify sets of variables to be included in the propensity score [26]. Similarly, it can be useful to guide the variable selection in the response model. Indeed, it allows researchers to better visualize the relationships between all the variables involved in the association of interest (exposure, outcome, response, covariates) and thus enables optimization of the specification of the response and association models.
DAGs are based on a priori knowledge and thus do not protect against misidentification of the role played by the variables. In surveys, rather than weighting individuals by their individual probability of response, the sample is often partitioned into response homogeneity groups (RHGs), i.e. groups that are homogeneous in terms of response probability. The parameters of interest are estimated in each group and then pooled across the groups to obtain an overall parameter. This strategy, while improving the precision of the estimates, protects against possible misspecification of the response model. The RHG method is quite similar to what is referred to as stratification in the context of the propensity score used to reduce a confounding bias. In the context of attrition, Seaman and White proposed a stratified IPPW method, but the stratification was not based on response probability but rather response patterns to survey waves [4]. Once the strata were defined, a response model was fitted independently in each stratum. We are not aware of any use of stratification on the probability of response in association studies based on cohort data, but this may be a new application of an existing method in other contexts (surveys, and propensity scores).

Strengths
The first strength of our study is that we tested through simulations nine response mechanism scenarios, corresponding to three degrees of correlation between the response variable and our interest variables (exposure, outcome). The parameters chosen were consistent with those observed in the TIMOUN cohort to represent a realistic setting. In addition, we evaluated the impact of several response models on the estimated exposure effect. This has not been previously performed in the literature when the IPW method has been used to reduce a selection bias.

Limitations
This study also had limitations. First, our simulation framework did not consider binary outcomes, although this is a common situation in epidemiology. However, recent literature indicates that CCA is potentially much less prone to give biased estimates of the exposure coefficient in a logistic regression [27], making the linear regression framework more challenging for evaluating the IPPW method. Second, in our simulations the level of attrition was kept constant, at a quite high but realistic level (40%) for cohort studies. The influence of the attrition rate is however well known, with the expected conclusion that bias and variance increase with the percentage of non-respondents [21]. Moreover, the attrition level we chose was realistic for cohort studies and thus quite high (40%). Third, we did not vary the degree of correlation between the covariates Z and the response or our variables of interest (exposure, outcome). However, Lewin et al. showed that a strong correlation between the outcome and a variable of type Z 3 (i.e. associated with the outcome and response) could increase the magnitude of the bias [24]. Finally, our study only considered attrition leading to missing outcomes and fully observed exposure and covariates. However, these variables may also be affected by partial non-response in everyday practice. Seaman proposed a mixed approach to address this problem, combining MI and the IPPW method [28]. Although it has already been used by several epidemiological researchers [29][30][31], such an approach should be further explored, especially for the evaluation of its superiority over a method combining IPPW and single imputation. Finally, we did not address the consequence of using estimated weights (inverse response probability) in the association models. The usual statistical packages and procedures estimate the standard error of the effect of exposure as if weights are a priori known, ignoring the extra-variability due to their estimation. Consequently, the standard error is biased and may mislead the conclusions about the significance of the effect. We proposed an exact estimation of the variance (linearized variance) that should be used when IPW is implemented (as we did in our illustrative example). The details of the calculations of this variance are available in Metten et al. [20].

Conclusion
Our study suggests that using IPPW to handle attrition in cohort studies does not reduce bias and may result in a loss of efficiency. These results therefore raise questions about the contribution of the IPW method to correcting possible selection bias that occurs in situations of attrition that lead to a missing outcome in association analyses. If the method is to be used, we encourage use of only the confounding variables of the association of interest in the response model.